


SYSTEMS FOR EXPRESSING TOXIC PROTEINS, VECTORS 
AND METHOD FOR PRODUCING TOXIC PROTEINS 



FIELD OF THE INVENTION 

The present invention relates to systems for expressing 
toxic proteins, to expression vectors comprising one of 
these systems, to prokaryotic cells transformed with 
these systems, and also to a method for synthesizing a 
toxic protein using these expression systems. 

It enables, for example, the overproduction in a 
prokaryotic cell, for example Escherichia coli 
(E. coli), of toxic hydrophobic proteins or peptides, 
for example the overproduction of transmembrane domains 
of viral envelope proteins. 

It finds many applications in particular in research 
concerning the mechanisms of viral infections, and in 
the search for and development of novel active 
principles for combating viral infections. 

In the description which follows, the references between square brackets [ ] refer to the attached reference list.

BACKGROUND OF THE INVENTION 

Determination of the three-dimensional (3D) structure 
is a decisive step in the structural and functional 
understanding of proteins. 

Very considerable effort and resources have been, and are still being, devoted to this aim, and these efforts have been amplified by the accumulation of data provided by the genome sequencing programmes [1].

The two main techniques for establishing these protein structures are X-ray diffraction, carried out on crystallized proteins, and nuclear magnetic resonance (NMR), carried out on proteins in solution. NMR, which is well suited to proteins with a molecular mass of less than 20 kDa, nevertheless requires, like X-ray diffraction, the production of large amounts of material. In most cases, it also requires the preparation of material enriched in ¹⁵N and/or ¹³C.

In this context, bacteria are a means of production widely used by the scientific community [2]. The overexpression of proteins in bacteria is not, however, free of problems. In practice, three situations arise:

The first case, which is ideal, is that where the protein is overproduced in a form that is correctly folded during its synthesis in vivo. This situation is not rare, but neither is it frequent. It mainly concerns small soluble proteins, i.e. of approximately 20 to 50 kDa.

The second case, the most common, is that where the protein is overproduced but aggregates in the form of inclusion bodies. This concerns polytopic and/or large proteins. In this case, the protein folds much more slowly than it is synthesized. This promotes exposure to the aqueous solvent of hydrophobic regions of the protein that are normally buried in its core, and generates non-specific interactions that result in the formation of insoluble aggregates. Depending on the degree of disorder of this folding, the inclusion bodies can be solubilized/unfolded under non-native conditions, with urea or guanidine. The solubilized protein is then subjected to various treatments, such as dialysis or dilution, so as to promote, successfully in certain cases, native 3D folding.

The third case is that where expression causes a varying degree of toxicity. This ranges from an absence of expression product, if the bacterium manages to adapt, to death of the bacterium if the product is too toxic. This case occurs quite frequently, most commonly with membrane proteins or membrane protein domains, for instance those of the envelope proteins of the hepatitis C virus [5] or of the human immunodeficiency virus [6].

The problem of toxicity relates essentially to the expression of membrane proteins, i.e. proteins having a hydrophobic domain. These proteins are of growing interest. Firstly, they are relatively numerous, since the sequencing of the various genomes confirms that they represent approximately 30% of the proteins potentially encoded by these genomes [7]. Secondly, they constitute 70% of therapeutic targets, and their alteration is the cause of many genetic diseases [8].

It is therefore essential to develop methods that 
facilitate or allow the expression of such proteins or 
of their membrane portion. 

Efforts have been made in this respect, for example the development of bacterial strains that either show better tolerance to the expression of membrane proteins [9, 10] or regulate expression more strictly, as in the case of the E. coli strain BL21(DE3)pLysS developed by Stratagene. However, these improvements do not eliminate the toxicity phenomenon in all cases, in particular in the expression of hydrophobic peptides corresponding to membrane anchors.

The treatment of hepatitis C currently represents one of the major high-stakes areas of medicine. Hepatitis C is caused by the hepatitis C virus (HCV), a member of the family Flaviviridae, which specifically infects hepatic cells [11]. This virus consists of a positive-strand RNA of approximately 9500 bases which encodes a polyprotein of 3033 residues [13], symbolized in the attached Figure 1 by the rectangle 1A. After expression, this polyprotein is cleaved by endogenous and exogenous proteases so as to give rise to 10 different proteins. Two of them, called E1 and E2, are glycosylated and form the envelope of the virus. They each have membrane domains called TM, namely TME1 for the E1 protein and TME2 for the E2 protein. The cleavage positions that generate them are indicated in Figure 1 by arrows, each with a number below it corresponding to the position in the polyprotein of the first amino acid of the sequence resulting from the cleavage. The E1 and E2 proteins are symbolized by rectangles. The white portion of each rectangle corresponds to the ectodomain (ed) and the shaded portion to the transmembrane region (TM). The primary sequence of the TMs is indicated at the bottom of the figure in one-letter code, with numbers corresponding to the positions in the polyprotein of the amino acids located at the ends of these domains. The stars indicate the hydrophobic amino acids. These membrane domains or regions of the virus have particular association properties that condition the structuring of the viral envelope [12]. In this respect, they constitute potential therapeutic targets. Understanding the mechanism of association of the virus requires studies of the 3D structure of these domains, in particular by means of the abovementioned techniques, which involves producing these peptides in abundant amounts, preferably via the biosynthetic pathway in order to allow ¹⁵N and/or ¹³C isotope labelling.

The various E1 expression trials of the prior art, in particular in E. coli [14] [5] or in Sf9 insect cells infected with baculoviruses [15], have not made it possible to overproduce this E1 protein, in particular due to the toxicity induced by its expression, including in the "resistant" E. coli BL21(DE3)pLysS strains described above. There has been no E2 protein overexpression trial in bacteria. These toxicity problems are essentially due to the C-terminal region of the two proteins, which is rich in hydrophobic amino acids forming the transmembrane domains that anchor the proteins to the membrane of the endoplasmic reticulum.

There is therefore a real need for a system for expressing toxic proteins which does not have the drawbacks, limitations, deficiencies and disadvantages of the techniques of the prior art.

In addition, there is a real need for an expression vector comprising such a system for expressing toxic proteins, making it possible to carry out a method for producing toxic proteins which does not have the drawbacks, limitations, deficiencies and disadvantages of the techniques of the prior art.

SUMMARY OF THE INVENTION

The aim of the present invention is precisely to 
provide a system for expressing a toxic protein, which 
satisfies, inter alia, the needs indicated above. 

This aim, and others, are achieved in accordance with the invention by means of an expression system characterized in that it comprises successively, in the 5'-3' direction, a nucleotide sequence encoding the dipeptide Asp-Pro, referred to below as the dp sequence, and a nucleotide sequence (pt) encoding a toxic protein (Pt). This system will be identified below as: dp-pt.

DETAILED DESCRIPTION OF THE INVENTION 

According to a particularly preferred embodiment of the present invention, the expression system also comprises, upstream of the dp sequence, a nucleotide sequence (ps) encoding a soluble protein (Ps). This soluble protein may be, for example, glutathione S-transferase (GST), thioredoxin (TrX) or another equivalent soluble protein. This expression system according to the invention will be identified below as: ps-dp-pt.

The dp-pt expression system of the present invention, which comprises a sequence encoding Asp-Pro (DP in one-letter code) placed upstream of the nucleotide sequence of the toxic protein, makes it possible, entirely unexpectedly, to suppress the toxic effect of the protein on the host cell. In addition, the inventors have noted, entirely surprisingly, that the suppression of toxicity in the host is even more effective with the ps-dp-pt expression system, in which the toxic peptide is produced as a C-terminal fusion with a soluble protein, for example glutathione S-transferase or thioredoxin, with the sequence Asp-Pro inserted between the soluble protein and the toxic peptide.

The dp-pt or ps-dp-pt expression system of the present invention makes it possible to overproduce toxic proteins in host cells, in particular hydrophobic proteins, especially peptides which correspond to, or comprise, hydrophobic domains of membrane-anchored proteins, for example a membrane protein or a domain of a membrane protein. The protein may be, for example, a protein of a virus, for example of the hepatitis C virus, of an AIDS virus, or of any other virus that is pathogenic for humans and, in general, for mammals.

For example, the dp-pt or ps-dp-pt system of the invention makes it possible to overproduce, in a host such as E. coli, the transmembrane domains of the E1 and E2 proteins of the hepatitis C virus, called TME1 and TME2, corresponding respectively to the sequences:

TME1: 347-MIAGAHWGVLAGIAYFSMVGNWAKVLVVLLLFAGVDA-383
SEQ ID NO: 1

TME2: 717-MEYVVLLFLLLADARVCSCLWMMLLISQAEA-746
SEQ ID NO: 2

whereas this was not possible with the techniques of the prior art.
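As a quick consistency check on the numbering above, the short Python sketch below compares the length of each peptide with the span of polyprotein positions it is said to cover, and marks hydrophobic residues in the manner of the stars of Figure 1. The hydrophobic set used here (A, V, I, L, M, F, W, C) is an assumption chosen for illustration; the residues starred in Figure 1 may have been selected by a different criterion. Consistently with the legend of Figure 1, the sketch treats only residues 2-31 of SEQ ID NO: 2 as corresponding to positions 717-746.

```python
# Peptide sequences of SEQ ID NO: 1 (TME1) and SEQ ID NO: 2 (TME2), as given above.
TME1 = "MIAGAHWGVLAGIAYFSMVGNWAKVLVVLLLFAGVDA"
TME2 = "MEYVVLLFLLLADARVCSCLWMMLLISQAEA"

# Assumed hydrophobic set, for illustration only; Figure 1 may star a different set.
HYDROPHOBIC = frozenset("AVILMFWC")


def span_length(first: int, last: int) -> int:
    """Number of residues between two inclusive polyprotein positions."""
    return last - first + 1


def mark_hydrophobic(seq: str) -> str:
    """Return a line with '*' under each residue in the assumed hydrophobic set."""
    return "".join("*" if aa in HYDROPHOBIC else " " for aa in seq)


def hydrophobic_fraction(seq: str) -> float:
    """Fraction of residues that belong to the assumed hydrophobic set."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


if __name__ == "__main__":
    # Per the Figure 1 legend, TME2 corresponds to residues 2-31 of SEQ ID NO: 2.
    domains = [("TME1", TME1, 347, 383), ("TME2", TME2[1:], 717, 746)]
    for name, seq, first, last in domains:
        assert len(seq) == span_length(first, last), name
        prefix = f"{name} {first}-{last}: "
        print(prefix + seq)
        print(" " * len(prefix) + mark_hydrophobic(seq))
        print(f"{name}: {hydrophobic_fraction(seq):.0%} hydrophobic (assumed set)")
```

Both domains are dominated by hydrophobic residues under this assumed set, which is consistent with their role as membrane anchors.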

The nucleotide sequences that can be used to constitute the dp-pt system of the invention encoding the TME1 (dp-pt(TME1)) or TME2 (dp-pt(TME2)) proteins can be any of the possible sequences encoding the DP-TME1 and DP-TME2 fusion proteins, respectively. The sequences encoding the TME1 and TME2 proteins may advantageously be, for example, SEQ ID NO: 3 and SEQ ID NO: 4, respectively, of the attached sequence listing. To obtain the dp-pt system, the dp sequence encoding the dipeptide Asp-Pro (DP) is added upstream of these sequences.

The nucleotide sequences that can be used to constitute the ps-dp-pt system of the invention encoding the TME1 (ps-dp-pt(TME1)) or TME2 (ps-dp-pt(TME2)) proteins may be any of the possible sequences encoding the Ps-DP-TME1 and Ps-DP-TME2 fusion proteins, respectively. They may advantageously be, for example, SEQ ID NOS: 34, 35 and 36 of the attached sequence listing for TME1, making it possible to obtain a Ps-DP-TME1 chimeric protein, and SEQ ID NOS: 37, 38 and 39 of the attached sequence listing for TME2, making it possible to obtain a Ps-DP-TME2 chimeric protein.

Indeed, the abovementioned nucleotide sequences have codons optimized for the expression of TME1 and TME2 in a bacterium, for example in E. coli.

A large number of HCV RNA sequences producing an 
infectious phenotype exist: these sequences can also be 
used in the present invention. 

The sequence encoding the dipeptide Asp-Pro may be, for example, gacccg, or any other sequence encoding this dipeptide.
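The role of the dp sequence can be illustrated by translating a ps-dp-pt junction in silico. The Python sketch below uses the standard genetic code; the ps and pt fragments are short hypothetical placeholders (they are not taken from SEQ ID NOS: 29, 30, 3 or 4), and the helper names are illustrative. It checks that gacccg reads as Asp-Pro and that the assembled junction stays in frame, which is the condition for the fusion protein to carry the DP dipeptide between the soluble protein and the toxic peptide.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

DP_CODONS = "gacccg"  # dp sequence given in the text (GAC = Asp, CCG = Pro)


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("length is not a multiple of 3: the fusion is out of frame")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def assemble(*parts: str) -> str:
    """Concatenate ps, dp and pt fragments in the 5'-3' direction."""
    return "".join(part.upper() for part in parts)


# Hypothetical placeholders, for illustration only.
PS_TAIL = "CTGGTTCCGCGTGGATCC"  # hypothetical 3' end of a ps sequence (Leu-Val-Pro-Arg-Gly-Ser)
PT_HEAD = "ATGGAATAT"           # hypothetical 5' end of a pt sequence (Met-Glu-Tyr)

if __name__ == "__main__":
    print(translate(DP_CODONS))                              # DP
    print(translate(assemble(PS_TAIL, DP_CODONS, PT_HEAD)))  # LVPRGSDPMEY
```

Dropping a single base from the dp codons makes `translate` raise `ValueError`, which mirrors the requirement that the dp and pt sequences be inserted in phase with the ps sequence.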

The sequence encoding GST may be, for example, that present in the pGEXKT plasmid, the sequence of which corresponds to SEQ ID NO: 29 of the attached sequence listing, or any equivalent sequence, i.e. one encoding this soluble protein. The sequence encoding TrX may be, for example, that present in the pET32a+ expression plasmid, the sequence of which corresponds to SEQ ID NO: 30 of the attached sequence listing, or any equivalent sequence, i.e. one encoding this soluble protein.

For the production of the toxic protein, the dp-pt or ps-dp-pt expression system of the invention is introduced into a host cell, for example by cloning into an appropriate plasmid, using the usual host transformation techniques of genetic recombination.

The plasmid into which the expression system of the present invention may be cloned so as to form this vector will be chosen in particular according to the host cell. It may be, for example, the pT7-7 plasmid (SEQ ID NO: 33 of the attached sequence listing), a plasmid of the pGEX series (for example SEQ ID NO: 31 of the attached sequence listing), sold for example by the company Pharmacia, or a plasmid of the pET32 series (for example SEQ ID NO: 32 of the attached sequence listing), sold for example by the company Novagen.

The plasmids of the pGEX series and of the pET32 series will advantageously be used for implementing the present invention. Indeed, they already comprise a ps sequence encoding a soluble protein (Ps), namely glutathione S-transferase and thioredoxin, respectively. Thus, advantageously, the dp-pt system will be cloned into these plasmids downstream of this ps sequence encoding the soluble protein.

The present invention therefore also relates to an 
expression vector comprising a dp-pt or ps-dp-pt 
expression system according to the invention; in 
particular, a vector comprising a dp-pt expression 
system according to the invention and the 
oligonucleotide sequence of the pT7-7 plasmid, or a 
vector comprising a ps-dp-pt expression system 
according to the invention and the oligonucleotide 
sequence of a pGEX plasmid or of a pET32 plasmid. 

For example, the expression vectors of the present invention that are suitable for a bacterial host such as E. coli and that allow overexpression of the abovementioned TME1 membrane protein may advantageously have an oligonucleotide sequence chosen from SEQ ID NOS: 40 (with pGEXKT), 42 (with pET32a+) and 44 (with pT7-7) of the attached sequence listing.

For example, the expression vectors of the present invention that are suitable for a bacterial host such as E. coli and that allow overexpression of the abovementioned TME2 membrane protein may advantageously have an oligonucleotide sequence chosen from SEQ ID NOS: 41 (with pGEXKT), 43 (with pET32a+) and 45 (with pT7-7) of the attached sequence listing.

Indeed, the abovementioned expression vectors have codons that are optimized for the expression of the chimeric proteins of the present invention, including TME1 and TME2, in a bacterium, for example in E. coli.

The present invention also relates to a prokaryotic cell transformed with an expression vector according to the invention. This transformed prokaryotic cell should preferably allow overexpression of the toxic protein encoded by the vector. Thus, any host cell capable of expressing the expression vector of the present invention can be used, for example E. coli, advantageously the E. coli strain BL21(DE3)pLysS.

The present invention also relates to a method for 
producing a toxic protein by genetic recombination, 
comprising the following steps: 

transforming a host cell with an expression vector according to the invention,

culturing the transformed host cell under culture conditions such that it produces, from said expression vector, a fusion protein comprising the dipeptide Asp-Pro followed by the peptide sequence of the toxic protein,

isolating said fusion protein, and

cleaving said fusion protein so as to recover the toxic protein.

The steps for transforming, culturing and isolating the 
chimeric protein produced can be carried out by means 
of the usual techniques of genetic recombination, for 
example by means of techniques such as those that are 
described in document [25] . 

The step of isolating the fusion protein can be carried out using the usual techniques known to those skilled in the art for isolating a protein from a cell extract.

The fusion protein produced by means of the method of 
the invention has a "soluble protein-Asp-Pro-toxic 
protein" sequence. In the present description, the 
dipeptide Asp-Pro is also called DP according to the 
one-letter amino acid code. 

For example, when the toxic protein is TME1, the fusion protein may have SEQ ID NO: 46 of the attached sequence listing, which corresponds to the GST-DP-TME1 fusion protein; SEQ ID NO: 48, which corresponds to the TrX-DP-TME1 fusion protein; or SEQ ID NO: 50, which corresponds to the M-DP-TME1 fusion protein.

For example, when the toxic protein is TME2, the fusion protein may have SEQ ID NO: 47 of the attached sequence listing, which corresponds to the GST-DP-TME2 fusion protein; SEQ ID NO: 49, which corresponds to the TrX-DP-TME2 fusion protein; or SEQ ID NO: 51, which corresponds to the M-DP-TME2 fusion protein.

The cleavage step can advantageously be carried out using formic acid, which cleaves the fusion protein at the Asp-Pro dipeptide. It may also be carried out using any appropriate technique known to those skilled in the art for recovering a protein from a fusion protein.
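The expected products of the cleavage step can be predicted from the fusion sequence. The Python sketch below assumes that the acid treatment cuts only the peptide bond between Asp and Pro, so that Asp remains at the C-terminus of the carrier fragment and the released toxic peptide starts with Pro; side reactions at other Asp-containing bonds, which can occur under acidic conditions, are ignored. The carrier segment used in the example is a hypothetical placeholder, not the GST or thioredoxin sequence of SEQ ID NOS: 52 and 53.

```python
TME1 = "MIAGAHWGVLAGIAYFSMVGNWAKVLVVLLLFAGVDA"  # SEQ ID NO: 1


def acid_cleave_dp(protein: str) -> list[str]:
    """Split a protein at every Asp-Pro peptide bond (idealized acid cleavage).

    Asp stays on the N-terminal fragment; Pro begins the C-terminal fragment.
    """
    fragments = []
    start = 0
    site = protein.find("DP")
    while site != -1:
        fragments.append(protein[start:site + 1])  # keep Asp on the left fragment
        start = site + 1                            # Pro starts the next fragment
        site = protein.find("DP", start)
    fragments.append(protein[start:])
    return fragments


if __name__ == "__main__":
    carrier = "MASGSLVPRGS"  # hypothetical carrier segment, for illustration only
    fusion = carrier + "DP" + TME1
    for fragment in acid_cleave_dp(fusion):
        print(fragment)
```

Because every Asp-Pro bond is a potential cut site in this model, a carrier or toxic peptide that itself contains a DP motif would yield additional fragments; TME1 as given contains none.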

The inventors are the first to have found a system that is truly effective for producing, and even overproducing, in particular in the bacterium Escherichia coli (E. coli), hydrophobic peptides corresponding to the membrane domains of the E1 and E2 proteins of the hepatitis C virus envelope, the expression of which is lethal for the microorganism.

The field of application of the present invention mainly concerns the large-scale production of hydrophobic peptides, in particular for fundamental and industrial research. In addition, the chimeric protein consisting of the soluble protein, the dipeptide Asp-Pro and the hydrophobic peptide can be used for a functional purpose, in particular for obtaining information on the degree of oligomerization of the membrane domain or on its heteropolymerization capacity.

The fusion proteins, or chimeric proteins, are produced from their coding DNA, present for example in commercial plasmids, downstream of which the DNA encoding the Asp-Pro sequence, followed by that encoding the toxic peptide, is introduced in phase. This application can be commercialized in the form of bacterial expression plasmids which include the sequence of the Asp-Pro site downstream of that of the soluble proteins already present. The corresponding plasmid will be described, for example, as a tool that facilitates the production, via the biological pathway, of toxic membrane peptides or proteins.

Thus, the present invention is applicable to any system for overexpressing recombinant proteins, with or without fusion to a soluble protein such as, for example, GST or thioredoxin, that includes a non-natural Asp-Pro sequence inserted upstream of a sequence encoding a toxic domain of a protein, for example a membrane domain of a protein.

Other characteristics and advantages of the present 
invention will become further apparent to those skilled 
in the art on reading the following examples given by 
way of non-limiting illustration, with reference to the 
sequence listing and to the figures that are attached. 

BRIEF DESCRIPTION OF THE ATTACHED SEQUENCE LISTING

SEQ ID NOS: 1 and 2: peptide sequences of TME1 and 
of TME2, respectively. 

SEQ ID NOS: 3 and 4: sequences encoding the TME1 peptide and the TME2 peptide, respectively.

SEQ ID NOS: 5 and 6: respectively, oligonucleotide (+) for insertion into pT7-7 (OL13(+)) and oligonucleotide (-) for insertion into pT7-7 (OL14(-)).

SEQ ID NOS: 7 and 8: respectively, coding sense DNA of TME1 + Cla I site in the 3' position and anticoding sense DNA of TME1 + Cla I site in the 5' position (sequence complementary to SEQ ID NO: 7).

SEQ ID NOS: 9 and 10: respectively, coding sense oligonucleotide (OL11(+)) and anticoding sense oligonucleotide (OL12(-)) for the synthesis of TME1.

SEQ ID NO: 11: oligonucleotide (+) for insertion into pGEXKT without dp site (OL15(+)).

SEQ ID NO: 12: oligonucleotide (+) for insertion into pGEXKT with dp site (OL17(+)).

SEQ ID NO: 13: oligonucleotide (-) for insertion into pGEXKT (OL16(-)).

SEQ ID NO: 14: oligonucleotide (+) for insertion into pET32a (OL18(+)) (hybridizes to segment 915-932 of pGEXKT).

SEQ ID NOS: 15 and 16: respectively, oligonucleotides (+) (OL19(+)) and (-) (OL20(-)) for insertion into pT7-7 of the DNA encoding M-DP-TME1.

SEQ ID NOS: 17 and 18: respectively, oligonucleotide (+) for insertion into pT7-7 (OL23(+)) and oligonucleotide (-) for insertion into pT7-7 (OL24(-)).

SEQ ID NOS: 19 and 20: respectively, coding sense DNA for TME2 + Nde I site in the 5' position and Hind III site in the 3' position; and anticoding sense DNA of TME2 + Nde I site in the 3' position and Hind III site in the 5' position (sequence complementary to SEQ ID NO: 19).

SEQ ID NOS: 21 and 22: respectively, coding sense oligonucleotide (OL21(+)) and anticoding sense oligonucleotide (OL22(-)) for the synthesis of TME2.

SEQ ID NO: 23: oligonucleotide (+) for insertion into pGEXKT without dp site (OL25(+)).

SEQ ID NOS: 24 and 25: respectively, oligonucleotides (+) (OL27(+)) and (-) (OL26(-)) for insertion into pGEXKT with dp site.

SEQ ID NOS: 26 and 27: respectively, oligonucleotides (+) (OL28(+)) and (-) (OL29(-)) for insertion into pT7-7 of the DNA encoding M-DP-TME2.

SEQ ID NO: 28: end of the sequence of the GST 
soluble protein followed by the thrombin site 
encoded in the pGEXKT plasmid. 

SEQ ID NO: 29: DNA encoding the GST protein in the pGEXKT plasmid.

SEQ ID NO: 30: DNA encoding thioredoxin (TrX) in the pET32a+ plasmid.

SEQ ID NOS: 31, 32 and 33: respectively, pGEXKT, pET32a+ and pT7-7 expression plasmids.

SEQ ID NOS: 34, 35 and 36: respectively, expression systems according to the invention encoding the GST-DP-TME1, TrX-DP-TME1 and M-DP-TME1 fusion proteins.

SEQ ID NOS: 37, 38 and 39: respectively, expression systems according to the invention encoding the GST-DP-TME2, TrX-DP-TME2 and M-DP-TME2 fusion proteins.

SEQ ID NOS: 40 and 41: respectively, pGEXKT-dp-pt(TME1) and pGEXKT-dp-pt(TME2) expression vectors according to the invention encoding the GST-DP-TME1 and GST-DP-TME2 fusion proteins.

SEQ ID NOS: 42 and 43: respectively, pET32a-dp-pt(TME1) and pET32a-dp-pt(TME2) expression vectors according to the invention encoding the TrX-DP-TME1 and TrX-DP-TME2 fusion proteins (encoded on the complementary strand).

SEQ ID NOS: 44 and 45: respectively, pT7-7-dp-pt(TME1) and pT7-7-dp-pt(TME2) expression vectors according to the invention encoding the M-DP-TME1 and M-DP-TME2 fusion proteins.

SEQ ID NOS: 46 and 47: respectively, GST-DP-TME1 and GST-DP-TME2 fusion proteins according to the invention obtained from the pGEXKT-dp-pt(TME1) and pGEXKT-dp-pt(TME2) plasmids.

SEQ ID NOS: 48 and 49: respectively, TrX-DP-TME1 and TrX-DP-TME2 fusion proteins according to the invention obtained from the pET32a-dp-pt(TME1) and pET32a-dp-pt(TME2) plasmids.

SEQ ID NOS: 50 and 51: respectively, M-DP-TME1 and M-DP-TME2 fusion proteins according to the invention obtained from the pT7-7-dp-pt(TME1) and pT7-7-dp-pt(TME2) plasmids.

SEQ ID NOS: 52 and 53: respectively, GST and TrX proteins encoded by the pGEXKT and pET32a+ vectors.

BRIEF DESCRIPTION OF THE FIGURES 

Figure 1: diagrammatic representation of a portion of the HCV polyprotein and peptide sequence of the C-terminal membrane domains of the E1 and E2 envelope proteins. The peptide sequences represented correspond to the infectious types #D00831 and #M67463 for TME1 (SEQ ID NO: 1) and TME2 (amino acids 2-31 of SEQ ID NO: 2), respectively, obtained from the public sequence library of the European Molecular Biology Laboratory (EMBL).

Figure 2: creation of the DNA encoding the C-terminal membrane domain of the HCV E1 envelope protein and additional sequences in the 5' and 3'
positions for cloning in various plasmids. The 
sequences represented in this figure are reported in 
the attached sequence listing. In particular, Figure 2A 
shows SEQ ID NO: 1; Figure 2B shows, from top to 
bottom, SEQ ID NOS: 1 and 3; Figure 2C shows, from top 
to bottom, SEQ ID NOS: 5, 9, 10, and 6; and Figure 2D 
shows, from top to bottom, SEQ ID NOS: 9, 10, 5, 6, 11, 
12, 13, 14, 13, 15 and 16. 

Figure 3: creation of the DNA encoding the 
C-terminal membrane domain of the HCV E2 envelope 
protein and additional sequences in the 5' and 3' 
positions for cloning in various plasmids. The 
sequences represented in this figure are reported in the attached sequence listing. In particular, Figure 3A shows SEQ ID NO: 2; Figure 3B shows, from top to bottom, SEQ ID NOS: 2 and 4; Figure 3C shows, from top to bottom, SEQ ID NOS: 17, 21, 22 and 18; and Figure 3D shows, from top to bottom, SEQ ID NOS: 21, 22, 17, 18, 23, 24, 25, 14, 25, 26 and 27.

Figure 4, panels A to F: toxicity of the membrane domains expressed in the bacterium and suppression of this toxicity by insertion of a dp site. Panels A, C and E are graphic representations of optical density (OD) measurements at 600 nm as a function of time (t), in hours, during production of various proteins in a bacterium with or without the expression system of the present invention. Panels B, D and F show the electrophoresis gels of the proteins of panels A, C and E, respectively.

Figures 5A and 5B: overexpression of the thioredoxin-Asp-Pro-Pt chimeric proteins (Pt = membrane domains of the proteins) in the bacterium. Figure 5A is a graphic representation of optical density (OD) measurements at 600 nm as a function of time, in hours, during production of various proteins in a bacterium with or without the expression system of the present invention; Figure 5B shows an electrophoresis gel of the proteins of Figure 5A.

Figure 6: expression and purification of the 
GST-TME2 fusion (or chimeric) protein, and comparison 
with GST alone. This figure represents, at the top, the 
peptide sequences of GST (SEQ ID NO: 52) and GST-TME2, 
and, at the bottom, the electrophoresis gels, showing that, unlike GST alone, GST-TME2 is insoluble: it is produced in the form of inclusion bodies that cannot fold correctly.

Figures 7A and 7B: graphic representations of comparative experimental results showing the effect of the DP dipeptide (dp-pt oligonucleotide sequence in accordance with the present invention) and of the DP dipeptide together with the soluble protein (ps-dp-pt oligonucleotide sequence in accordance with the present invention) on the synthesis of the TME1 and TME2 toxic proteins.

EXAMPLES 

In these examples, the oligonucleotides used were ordered from Laboratoires EUROBIO; the plasmids were prepared with the QIAprep kit (brand name) from Qiagen; the DNA sequences were sequenced with the ABI PRISM (registered trade mark) BigDye (brand name) Terminator cycle kit from Applied Biosystems; the E. coli strains BL21(DE3) and BL21(DE3)pLysS were obtained from Stratagene; the C41 and C43 (BL21(DE3)) strains were provided by Dr. Bruno Miroux (CNRS-CEREMOD, Centre for Research on Molecular Endocrinology and Development); the DNA restriction and modification enzymes were obtained from New England Biolabs; the protein electrophoreses were carried out with a Mini-PROTEAN 3 (brand name) from Bio-Rad Laboratories; the plasmid pCR (registered trade mark) T7 topo TA was obtained from Invitrogen; the pET32a+ plasmid was obtained from Novagen; the pT7-7 and pGP1-2 plasmids and the K38 strain [22] were requested from Prof. Tabor (Department of Biological Chemistry, Harvard Medical School); the pGEX-KT plasmid was requested from Prof. Dixon (Department of Biological Chemistry, University of Michigan Medical School); the other products were obtained from Sigma.

In the following examples, the TME1 and TME2 peptides were first produced without the expression system of the present invention, then as a fusion with a soluble protein and, finally, as a fusion with GST with the Asp-Pro site ("DP" in one-letter code) inserted between the soluble protein and TME1 or TME2.

The abbreviation "SEQ ID NO:" refers to the attached 
sequence listing. 

Example 1: Synthesis of the expression system

1.1) CONSTRUCTION OF THE pT7-7-pt(TME1) AND pT7-7-pt(TME2) EXPRESSION VECTORS

The DNA encoding the two domains was synthesized de novo using the appropriate oligonucleotides. For each amino acid, the codon most frequently used in the bacterium was chosen, as quantified by Sharp et al. [17]. The constructs are described in the attached Figure 2 for TME1 and in the attached Figure 3 for TME2.
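The codon choice described above amounts to back-translating each peptide with one preferred codon per amino acid. The Python sketch below illustrates that step; the `PREFERRED_CODONS` table is an illustrative assumption (codons commonly regarded as favoured in highly expressed E. coli genes), not the values tabulated by Sharp et al. [17], and its output need not reproduce SEQ ID NOS: 3 and 4. A round trip through the standard genetic code checks that the designed DNA encodes the intended peptide.

```python
# Illustrative preferred-codon table (assumption, not the values of Sharp et al. [17]).
PREFERRED_CODONS = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}

# Standard genetic code for checking the design (bases ordered T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def back_translate(peptide: str) -> str:
    """Encode a peptide using one preferred codon per amino acid."""
    return "".join(PREFERRED_CODONS[aa] for aa in peptide)


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence with the standard genetic code."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    tme2 = "MEYVVLLFLLLADARVCSCLWMMLLISQAEA"  # SEQ ID NO: 2
    dna = back_translate(tme2) + "TAA"        # append a stop codon
    print(dna)
    assert translate(dna) == tme2 + "*"
```

A one-codon-per-residue table is the simplest reading of "greatest frequency of use"; real designs may also avoid repeats or unwanted restriction sites, which this sketch does not attempt.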

Each synthetic DNA was generated using a set of two long, overlapping oligonucleotides, OL11 (SEQ ID NO: 9) and OL12 (SEQ ID NO: 10) for TME1, and OL21 (SEQ ID NO: 21) and OL22 (SEQ ID NO: 22) for TME2, which were amplified after hybridization with two external oligonucleotides chosen according to the plasmid used for cloning. Thus, the clonings in pT7-7 were carried out using the external oligonucleotides OL13 (SEQ ID NO: 5) and OL14 (SEQ ID NO: 6) for TME1, and OL23 (SEQ ID NO: 17) and OL24 (SEQ ID NO: 18) for TME2.
Each synthetic DNA was thus generated using a set of four
oligonucleotides: two long and overlapping, and two
short and external. The DNAs were amplified by the
polymerase chain reaction method ("PCR")
[18], and then cloned into the bacterial plasmid pCR
(brand name) T7 topo TA. The synthesized DNAs were
sequenced and then subcloned into the pT7-7 bacterial
expression vector [19] using the Nde I restriction site
in the 5' position and the Cla I or Hind III
restriction site in the 3' position.
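Before subcloning, it is useful to check where the recognition sites of the enzymes used (Nde I, Hind III, Cla I, and later BamH I, EcoR I, EcoR V and Msc I) occur in a primer or insert. The sketch below is a simple exact-match scan; the function name `find_sites` is hypothetical. All seven sites are palindromic, so scanning one strand is sufficient. Applied to OL29 of Example 6, it finds, besides the EcoR I site, a Hind III site (AAGCTT) spanning the TAG stop codon.

```python
# Recognition sequences (5'->3') of the enzymes named in Example 1.
# All seven are palindromic, so scanning one strand also covers the other.
SITES = {
    "Nde I": "CATATG",
    "Hind III": "AAGCTT",
    "Cla I": "ATCGAT",
    "BamH I": "GGATCC",
    "EcoR I": "GAATTC",
    "EcoR V": "GATATC",
    "Msc I": "TGGCCA",
}


def find_sites(seq: str) -> dict[str, list[int]]:
    """Map each enzyme to the 0-based start positions of its site in `seq`."""
    seq = seq.upper()
    hits: dict[str, list[int]] = {}
    for enzyme, site in SITES.items():
        positions = [i for i in range(len(seq) - len(site) + 1)
                     if seq.startswith(site, i)]
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    ol29 = "CAGAATTCCTAAGCTTCAGCCTGAGAG"  # OL29 of Example 6, spaces removed
    print(find_sites(ol29))  # {'Hind III': [10], 'EcoR I': [2]}
```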

In Figure 2 : 

A: TME1 peptide sequence of subtype #D00831. The
numbering corresponds to the position of the sequence
in the polyprotein as described in Figure 1.
B: DNA sequence encoding the membrane domain with
codons optimized for expression in the bacterium.
C and D: Strategy for template-free DNA amplification.
The coding (sense) and non-coding (antisense)
orientations of the oligonucleotides are indicated by
the signs (+) and (-), respectively. The long
oligonucleotides overlap by about twenty bases, so that
each primes the extension of the other and together they
create the template. The short oligonucleotides then
amplify this template by PCR, introducing the
restriction sites required by the plasmids used. The
insertion into pT7-7 was carried out with the pair of
oligonucleotides OL13 (SEQ ID NO: 5) and OL14 (SEQ ID
NO: 6), via a subcloning in pCRT7 topo, introducing the
Nde I and Hind III sites. The insertion into pGEXKT was
carried out by the same method, with the pair of
oligonucleotides OL15 (SEQ ID NO: 11) and OL16 (SEQ ID
NO: 13), introducing the BamH I and EcoR I sites. The
insertion of the dp site (gacccg) and the cloning in
pGEXKT were carried out with the pair of
oligonucleotides OL17 (SEQ ID NO: 12) and OL16 (SEQ ID
NO: 13). The construct in pGEXKT was transferred into
pET32a, which encodes thioredoxin, with the pair of
oligonucleotides OL18 (SEQ ID NO: 14) and OL16 (SEQ ID
NO: 13). The oligonucleotide OL18 (SEQ ID NO: 14)
hybridizes in the terminal region of the DNA encoding
GST in pGEXKT. The amplified sequence includes the end
of GST (SDLSGGGGG) followed by the thrombin site
(LVPRGS) (SEQ ID NO: 28), the DP site and the
transmembrane segment. After cloning, the DNA inserted
into pET32a allows expression of the
thioredoxin-SDLSGGGGGLVPRGS-DP-TME1 chimera (SEQ ID
NO: 48).
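The product expected from two long oligonucleotides overlapping at their 3' ends (panels C and D) can be computed as follows. This is a minimal sketch under stated assumptions: both oligonucleotides are written 5'->3', base pairing in the overlap is perfect, and the sequences in the usage example are hypothetical (the actual OL11/OL12 sequences are in the sequence listing, not reproduced here). The names `reverse_complement` and `assemble` are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def assemble(sense_oligo: str, antisense_oligo: str, min_overlap: int = 15) -> str:
    """Sense strand of the duplex formed by two oligos overlapping at their 3' ends.

    Both oligos are given 5'->3'. The antisense oligo is reverse-complemented and
    the longest exact match with the 3' end of the sense oligo is taken as the
    overlap; each oligo is then (conceptually) extended along the other.
    """
    sense = sense_oligo.upper()
    anti_rc = reverse_complement(antisense_oligo)
    for k in range(min(len(sense), len(anti_rc)), min_overlap - 1, -1):
        if sense[-k:] == anti_rc[:k]:
            return sense + anti_rc[k:]
    raise ValueError(f"no overlap of at least {min_overlap} bases")


if __name__ == "__main__":
    # Hypothetical 60-nt target split into two 40-nt oligos overlapping by 20 nt
    target = "ATGGCTAGCATCGGTACCTTGAAGCGTCAGTTCGATAACCGTGGTATCCTGAAAGCTGAC"
    sense, antisense = target[:40], reverse_complement(target[20:])
    print(assemble(sense, antisense) == target)  # True
```

The external primers (OL13/OL14, etc.) then amplify this full-length product and add the cloning sites at its ends.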

In Figure 3 : 

The legend is identical to Figure 2, but the peptide
sequence is that of subtype #M67463. The insertion into
pT7-7 was carried out with the pair of oligonucleotides
OL23 and OL24 (SEQ ID NO: 17 and SEQ ID NO: 18,
respectively), via a subcloning in pCRT7 topo,
introducing the Nde I and Hind III sites.

The insertion into pGEXKT was carried out by the same
method, with the pair of oligonucleotides OL25 and OL26
(SEQ ID NO: 23 and SEQ ID NO: 25, respectively),
introducing the BamH I and EcoR I sites. Insertion of
the dp site (gacccg) and the cloning in pGEXKT were
carried out with the pair of oligonucleotides OL27 and
OL26 (SEQ ID NO: 24 and SEQ ID NO: 25, respectively).
Insertion into pET32a was carried out as described for
Figure 2, using the pair of oligonucleotides OL18 and
OL26 (SEQ ID NO: 14 and SEQ ID NO: 25, respectively).
1.2) CONSTRUCTION OF THE pGEXKT-ptTME1, pGEXKT-ptTME2,
pGEXKT-dp-ptTME1 AND pGEXKT-dp-ptTME2 EXPRESSION VECTORS

The pGEXKT-ptTME1 and pGEXKT-ptTME2 expression vectors
were constructed by PCR as described in the attached
Figures 2 and 3. The template DNA used to amplify the
DNAs encoding TME1 or TME2 is that cloned into the
pT7-7 plasmids. The cloning of TME1 into the pGEXKT
plasmid [20, 21] was carried out using the pair of
oligonucleotides OL15 (SEQ ID NO: 11) and OL16 (SEQ ID
NO: 13), which introduce the BamH I restriction site in
the 5' position and the EcoR I restriction site in the
3' position. The cloning of TME2 into the same vector
was carried out using the pair of oligonucleotides OL25
(SEQ ID NO: 21) and OL26 (SEQ ID NO: 23).

As indicated in Figure 2, the insertion of the dp site 
at the N-terminal position of TME1 was carried out by 
replacing the 5' oligonucleotide OL15 (SEQ ID NO: 11) 
with the oligonucleotide OL17 (SEQ ID NO: 12). The
insertion of the dp site at the N-terminal position of 
TME2 was carried out by replacing the 5' 
oligonucleotide OL25 (SEQ ID NO: 21) with the 
oligonucleotide OL27 (SEQ ID NO: 22), as shown in 
Figure 3. 

1.3) CONSTRUCTION OF THE pET32a-dp-TME1 AND
pET32a-dp-TME2 EXPRESSION VECTORS

The pET32a-dp-TME1 and pET32a-dp-TME2 expression
vectors were constructed by PCR as described in the
attached Figures 2 and 3, using the oligonucleotides
indicated. The upstream oligonucleotide introduces an
EcoR V site and hybridizes with the terminal region of
the gene encoding GST. It makes it possible to include
the 5-glycine tail and the thrombin-cleavage site
present in the plasmid. The downstream oligonucleotide
is the same as that used for the cloning in pGEXKT.

The insertion into the pET32a plasmid is carried out
via the Msc I/EcoR V sites in the 5' position and the
EcoR I site in the 3' position. It makes it possible to
insert, in frame at the end of the thioredoxin
sequence, the 5-glycine tail, the thrombin-cleavage
site, the DP site and the transmembrane segment. The
parent pET32a plasmid, which serves as a control,
encodes thioredoxin followed by a sequence containing
various elements that are retained in this control and
that contribute substantially to the mass of the
protein produced.
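Whether an insert is read "in frame" after the thioredoxin sequence can be checked by translating the junction. The sketch below uses the standard genetic code and treats a fusion as in frame when the upstream sequence is a whole number of codons and no stop codon occurs before the end. The sequences in the usage lines are hypothetical short fragments, not the pET32a sequence, and the function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a DNA reading frame; a trailing incomplete codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def in_frame_fusion(upstream: str, insert: str) -> bool:
    """True if `insert` is read in frame after `upstream` with no premature stop."""
    if len(upstream) % 3 != 0:
        return False
    protein = translate(upstream + insert)
    return "*" not in protein.rstrip("*")  # ignore the terminal stop only


if __name__ == "__main__":
    print(translate("ATGGACCCGATCGCT"))            # MDPIA
    print(in_frame_fusion("ATGTCT", "GACCCGTAG"))  # True
    print(in_frame_fusion("ATGTC", "GACCCG"))      # False (frame shifted)
```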

The template DNA used to amplify the DNAs encoding TME1
or TME2 is that cloned into the pGEXKT-dp-ptTME1 or
pGEXKT-dp-ptTME2 plasmids. For TME1, the cloning into
pET32a+ was carried out using the pair of
oligonucleotides OL18 (SEQ ID NO: 14) and OL16 (SEQ ID
NO: 13). The cloning of TME2 into the same vector was
carried out using the pair of oligonucleotides OL18
(SEQ ID NO: 14) and OL26 (SEQ ID NO: 23), as indicated
in Figure 3.

Example 2: Expression of sequences encoding the TME1
and TME2 proteins alone

The expression of the sequences encoding the TME1 and
TME2 domains alone was tested by thermal or chemical
induction and using various bacterial strains as
described below.
2.1) THERMAL INDUCTION SYSTEM

The system developed by Tabor [22] makes it possible to
express a protein by thermal induction using two
vectors in the same bacterium, pT7-7 and pGP1-2.

The pT7-7 plasmid contains the DNA to be expressed,
placed under the control of a φ10 promoter recognized
by the T7 phage RNA polymerase. The pGP1-2 plasmid
contains the gene encoding the T7 phage RNA polymerase,
placed under the control of a λpL promoter. This
promoter is repressed by a thermosensitive repressor,
cI857, which is itself also encoded by pGP1-2. At 30°C,
cI857 is expressed and represses the λpL promoter,
which blocks the expression of the polymerase and
therefore also that of the protein of interest.

The induction is triggered by shifting the culture
from 30 to 42°C for 15-30 min, which inactivates the
repressor, and the expression then continues at 37°C.
This system is therefore particularly suitable when the
expression of a given protein must be strictly
controlled, in particular if said protein is toxic for
the bacterium.

2.2) CHEMICAL INDUCTION SYSTEM

The same pT7-7 plasmid containing the DNA to be
expressed is this time introduced into E. coli bacteria
of the type BL21(DE3) (B F⁻ dcm ompT hsdS(r⁻ m⁻) gal
λ(DE3)) and BL21(DE3)pLysS (B F⁻ dcm ompT hsdS(r⁻ m⁻)
gal λ(DE3) [pLysS CamR]). These bacteria have been
modified so as to contain in the genome a copy of the
gene encoding the T7 phage RNA polymerase, placed under
the control of a lacUV5 promoter that can be induced
with isopropyl-1-thio-β-D-galactoside (IPTG). In this
case, the bacteria are cultured at their optimum
temperature of 37°C, or lower if necessary. The
expression is induced by adding IPTG to the culture.
The BL21(DE3)pLysS strain is particularly suitable for
proteins whose basal (uninduced) expression is toxic
for the host bacterium. The pLysS plasmid allows
continuous low-level expression of T7 phage lysozyme,
which inhibits the T7 phage RNA polymerase; without it,
the weak expression of this polymerase in the absence
of induction could allow basal expression of the toxic
protein.

The inventors also tested the expression of the 
membrane domains alone in strains called C41 and C43 
[10], which were selected so as to withstand the 
expression of toxic membrane proteins. These strains 
are derived from the BL21(DE3) strain and are used in 
the same way as the latter. 

2.3) EXPRESSION TESTS 

According to the system tested, the corresponding 
plasmids were introduced by transformation into the 
various strains of E. coli: K38 (HfrC λ) for the Tabor
thermal induction system or the various BL21 strains 
for the chemical induction. Table 1 below summarizes 
the tests performed. 



Table 1

| Induction | Strain | Plasmid |
| Thermal | K38 | pT7-7 + pGP1-2 |
| Chemical | BL21(DE3) | pT7-7 |
| Chemical | BL21(DE3)pLysS | pT7-7 |
| Chemical | C41 (BL21(DE3)) | pT7-7 |
| Chemical | C43 (BL21(DE3)) | pT7-7 |

In each case, about ten transformants were placed in
culture in order to test the expression. Briefly, the
bacteria were cultured in 5 ml of LB (10 g tryptone,
5 g yeast extract, 5 g NaCl, qs 1 litre H2O),
supplemented with 50 µg/ml of ampicillin (necessary in
order to maintain pT7-7 in the bacterium) and 60 µg/ml
of kanamycin (necessary in order to maintain pGP1-2 in
the bacterium), and then cultured until saturation,
either at 30°C for K38 or at 37°C for BL21(DE3). The
cultures were then diluted 1/10 in the same culture
medium and cultured to an optical density (OD) of 1,
measured at 600 nm on a Philips PU8740
spectrophotometer (brand name).

The expression was then induced either thermally (K38)
at 42°C for 15 min, or chemically (BL21(DE3)) by adding
1 mM IPTG. It was continued for 3-5 hours at 37°C. The
OD600nm of the cultures was measured at various times.

At the end of the expression, a volume of culture
containing the equivalent of 0.1 OD of bacteria was
removed. The bacteria were harvested by centrifugation
and suspended in 50 µl of lysis solution (LS: 50 mM
Tris-Cl, pH 8.0, 2.5 mM EDTA, 2% SDS, 4 M urea, 0.7 M
β-mercaptoethanol). After a few minutes at ambient
temperature, 10 µl were loaded onto a 16.5%
polyacrylamide gel for "Tricine" type electrophoresis
[23], which gives good separation of low molar mass
proteins.
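The normalization "equivalent of 0.1 OD of bacteria" can be turned into a sampling volume. This sketch assumes the usual laboratory convention that one OD unit is 1 ml of culture at OD600 = 1 (the text does not define it); the final OD600 used in the usage line is a hypothetical value, and the function names are illustrative.

```python
def sample_volume_ul(od600: float, od_units: float = 0.1) -> float:
    """Culture volume (µl) containing `od_units` OD600 units at the given OD600.

    Assumes 1 OD unit = 1 ml of culture at OD600 = 1.
    """
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return od_units / od600 * 1000.0


def od_units_per_lane(od_units: float = 0.1, lysis_ul: float = 50.0,
                      loaded_ul: float = 10.0) -> float:
    """OD units loaded on the gel when `loaded_ul` of the lysate is used."""
    return od_units * loaded_ul / lysis_ul


if __name__ == "__main__":
    # Hypothetical final OD600 of 2.0 (not a value reported in the text)
    print(round(sample_volume_ul(2.0), 1))  # 50.0
    print(round(od_units_per_lane(), 3))    # 0.02
```

Normalizing by OD in this way makes the lanes comparable between cultures whose growth was arrested to different extents by induction.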



In Figure 4 : 

Panels A, C and E: The bacteria were transformed with
the plasmids pT7-7, pT7-7-TME1 and pT7-7-TME2 (panel A),
pGEXKT, pGEXKT-TME1 and pGEXKT-TME2 (panel C), and
pGEXKT-dp-TME1 and pGEXKT-dp-TME2 (panel E), and then
cultured and induced as described above. The bacterial
growth was followed by measuring the increase in
turbidity of each culture (optical density at 600 nm)
as a function of time in hours.

Panels B, D and F: The bacteria were sampled at the time
indicated in the text and treated as described above.
They were then loaded onto an electrophoresis gel,
either 16.5% acrylamide of the "Tricine" type (panel
B), or 14% acrylamide of the Laemmli SDS-PAGE type
(panels D and F). The gel shown in panel F was run for
longer than that shown in panel D, in order to improve
the separation of the bands in the 30 000 Da region.
After migration, the gels were stained for 10 minutes
with Coomassie blue in a solution of 40% methanol, 10%
acetic acid and 0.1% Coomassie blue R250, and then
destained in a solution of 10% methanol, 10% acetic
acid and 1% glycerol.

Whatever the system tested, the first observation is
that the frequency of transformation of the bacteria
was low. For the bacteria that could be selected, the
result of the expression tests was systematically
negative. An example is given in Figure 4, panels A and
B, with the series BL21(DE3)pLysS {[pT7-7],
[pT7-7-TME1] or [pT7-7-TME2]}. As illustrated by the
growth curves of panel A of Figure 4, the inventors
noted that, for the clones transformed with pT7-7-TME1
or pT7-7-TME2 and resistant on solid medium, induction
stops bacterial growth virtually immediately, unlike
the clones containing the plasmid alone. Similarly, as
can be seen in Figure 4(B), no protein band can be
observed in the region corresponding to the molecular
mass of the expression products (~3000-4000 Da) or of
their oligomers (n × that mass, n = 1, 2, 3, ...).

The most probable explanation for this situation is
that the expression of the membrane domains is very
toxic for the bacterium. The difficulty in obtaining
transformants implies that basal expression, even at a
very low level, is sufficient to kill them. It also
shows that the pLysS system does not completely prevent
this basal expression. Among the bacteria that survive
the transformation step, induction of expression of the
hydrophobic domains is immediately lethal. The systems
used do protect the surviving host bacteria against
basal expression, but as soon as expression is induced,
the toxicity is immediate and the bacteria are killed.

Example 3: Expression of sequences encoding the
GST-TME1 and GST-TME2 fusion proteins

The expression vectors were constructed as described in
Example 1, and then introduced into BL21(DE3)pLysS
bacteria. This strain was used for comparison with the
preceding experiments, since the expression of GST or
of its chimeras does not itself require the DE3-pLysS
system.

The expression was induced with IPTG as for that of the 
domains alone. The characteristics of the proteins 
produced are summarized in Table 2 below. 

Table 2

| Plasmid | Chimera, abbreviation | Construct | Size, aa | Mass, Da |
| pGEXKT | GST, G | 1M-D239 | 239 | 27469 |
| pGEXKT-T1 | GST-TME1, GT1 | 1M-S233-347M-A383 | 269 | 30506 |
| pGEXKT-T2 | GST-TME2, GT2 | 1M-S233-717E-A746 | 263 | 30191 |



The amino acids (aa) are indicated with the one-letter
code. The numbering of the sequences refers to the
proteins of origin, GST and the viral polyprotein.
Residues of the membrane domains are indicated in
italics.

Panels C and D of the attached Figure 4 show the
results obtained. The growth curves for the bacteria
transformed with the various plasmids show that
expression of the GT1 and GT2 chimeras is toxic. As can
be seen on the Laemmli 14% SDS-PAGE gel [24], the
expression of TME1 fused to GST is accompanied by the
absence of any band migrating at the expected size of
30 kDa. This implies that a very low level of
expression of the chimera is sufficient to kill the
bacteria. In contrast, the GST-TME2 chimera is this time
visible on the gel, in the region of the expected
molecular mass of 30 kDa. The level of expression
nevertheless remains limited.

The protein produced is not soluble despite the
presence of GST in the fusion. Indeed, as shown in the
attached Figure 6, the solubilization, folding and
purification trials for the GST-TME2 chimera failed.

To obtain the results shown in Figure 6, the GST and
GST-TME2 proteins were expressed as described for
Figure 4, using 150 ml of culture medium. The bacteria
were then harvested by centrifugation and suspended
(20 mM KPO4, pH 7.7, 0.1 M NaCl, 1 mM EDTA, 1 mM NaN3)
at 100 OD/ml. Two ml of each suspension were sonicated
with 30 sec pulses at an amplitude of 15%. After
sonication, a sample is taken for electrophoresis. It
corresponds to the well "To" in Figure 6 (the "total"
fraction).

A first low-speed centrifugation (5000 × g, 15 minutes)
separates the non-ruptured bacteria and the inclusion
bodies from the soluble and membrane proteins. The
latter are found in the supernatant, from which a
sample is taken. It corresponds to the well "Surn" in
Figure 6.
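The resuspension "at 100 OD/ml" described above fixes the buffer volume once the OD600 at harvest is known. A minimal sketch, assuming again that one OD unit is 1 ml of culture at OD600 = 1; the harvest OD600 of 2.0 in the usage line is hypothetical, and `resuspension_volume_ml` is an illustrative name.

```python
def resuspension_volume_ml(od600: float, culture_ml: float,
                           target_od_per_ml: float = 100.0) -> float:
    """Buffer volume (ml) giving `target_od_per_ml` from a harvested culture.

    Assumes 1 OD unit = 1 ml of culture at OD600 = 1.
    """
    total_od_units = od600 * culture_ml
    return total_od_units / target_od_per_ml


if __name__ == "__main__":
    # Hypothetical OD600 of 2.0 at harvest for the 150 ml culture of the text
    print(resuspension_volume_ml(2.0, 150.0))  # 3.0
    # The 2 ml aliquot taken for sonication then holds 2 x 100 = 200 OD units
```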

The fraction containing GST alone is then treated with
an affinity resin that specifically binds and then
elutes this protein (well "Af" of the GST gel in
Figure 6).

The fraction containing the insoluble GST-TME2 protein
is treated either with a mild detergent such as
Triton X-100 (TX100), in the presence or absence of
NaCl, or with a more solubilizing but more denaturing
detergent such as sarkosyl, before again being diluted
in TX100 and passed over the affinity resin.

The results in Figure 6 show that GST is present in the
soluble fraction, unlike the GST-TME2 fusion, which
indicates that the latter is insoluble. The supernatant
containing the GST is passed over a GSH-agarose resin
capable of binding GST. This GST is then eluted with an
excess of GSH (well marked "Af" of the GST gel in
Figure 6).

The pellet containing the GST-TME2 fusion is not 
solubilized in the presence of a mild detergent such as 
TX100 (with or without added NaCl, well "TX100 +/- 
NaCl" of the GST-TME2 gel), but it can be solubilized 
with a more aggressive detergent such as sarkosyl. 
However, after dilution of the protein thus solubilized 
in TX100, a mild detergent which should favour its 
folding, the protein is not retained on the affinity 
resin, unlike GST, which suggests that the fusion 
protein cannot be folded. 

These tests clearly indicate that the GST-TME2 protein 
is produced in the form of inclusion bodies that cannot 
be correctly folded. 

Example 4: Expression of expression vectors encoding 
the fusion proteins including an Asp-Pro site and a GST 
site 

The vectors encoding the GST-Asp-Pro-TME1 and
GST-Asp-Pro-TME2 chimeric proteins were constructed as
described above for the two vectors encoding the
GST-TME1 and GST-TME2 chimeric proteins. They are
summarized in Table 3 below.



Table 3

| Plasmid | Chimera, abbreviation (Fig. 4) | Construct | Size, aa | Mass, Da |
| pGEXKT-dp-T1 | GST-DP-TME1; GDPT1 | 1M-D233-dp-347M-A383 | 271 | 30718 |
| pGEXKT-dp-T2 | GST-DP-TME2; GDPT2 | 1M-S233-dp-717E-A746 | 265 | 30403 |



The amino acids (aa) are indicated with the one-letter
code. The numbering of the sequences refers to the
proteins of origin, GST and the viral polyprotein.
Residues of the membrane domains are indicated in
italics.
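The sizes and masses in Tables 2 and 3 can be cross-checked: inserting the DP dipeptide should add 2 residues and the average residue masses of Asp and Pro, 115.09 + 97.12 ≈ 212.2 Da. The residue masses used below are standard average values and are not taken from the document; the table values are copied from Tables 2 and 3.

```python
# Average residue masses (Da) of Asp and Pro (free amino acid minus water);
# standard tabulated values, not taken from the document.
AVG_RESIDUE_MASS = {"D": 115.0886, "P": 97.1167}

# (size in aa, mass in Da) copied from Tables 2 and 3.
WITHOUT_DP = {"T1": (269, 30506), "T2": (263, 30191)}
WITH_DP = {"T1": (271, 30718), "T2": (265, 30403)}

dp_increment = AVG_RESIDUE_MASS["D"] + AVG_RESIDUE_MASS["P"]

for name in ("T1", "T2"):
    d_size = WITH_DP[name][0] - WITHOUT_DP[name][0]
    d_mass = WITH_DP[name][1] - WITHOUT_DP[name][1]
    print(f"{name}: +{d_size} aa, +{d_mass} Da (expected +{dp_increment:.1f} Da)")
# T1: +2 aa, +212 Da (expected +212.2 Da)
# T2: +2 aa, +212 Da (expected +212.2 Da)
```

Both chimeras gain exactly 212 Da, and the TME1/TME2 difference (315 Da) is the same with and without DP, which supports the internal consistency of the two tables.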

The vectors were tested as described in the preceding
example. The results obtained are shown in panels E
and F of the attached Figure 4.

The growth curves for the bacteria transformed with the
various plasmids show that the expression of the GDPT1
and GDPT2 chimeras is clearly less toxic than in the
previous cases. Panel F shows that, this time, the TME1
chimera is produced, owing to the presence of the DP
site. Its level of expression, as can be seen in panel
F, is moderate but significant. GST-DP-TME2 is clearly
overproduced. The two proteins migrate in their
expected molecular mass region.

The effect of adding the DP dipeptide is as
significant as it is unexpected: it increases the
expression of the domains and suppresses their
toxicity. This attenuation of toxicity has not been
reported for the DP dipeptide, whose only property
reported to date is its ability to be cleaved by formic
acid. Since the effect is observed on two different
peptides that are both initially toxic for the
bacterium, it is reasonable to expect that this
property may extend to other hydrophobic and toxic
peptides.

The inventors verified that the site can be effectively 
cleaved by formic acid: the cleavage is slow and 
requires approximately 7 days at ambient temperature. 
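The fragments expected from the two cleavable sites of the chimeras (thrombin site LVPR|GS in the linker, and the acid-labile Asp-Pro bond) can be mapped with a simple sequence sketch. Assumptions: cleavage is treated as perfectly specific at these motifs (in practice formic acid cleavage is slow and may not be quantitative), the thrombin motif is simplified to LVPR|GS, and the peptide in the usage line is only the linker end plus the first TME1 residues encoded by OL19. The names `cleave`, `ASP_PRO` and `THROMBIN` are illustrative.

```python
import re

# Zero-width patterns marking the scissile bonds (simplified assumptions):
ASP_PRO = r"(?<=D)(?=P)"        # Asp-Pro bond, cleaved by formic acid
THROMBIN = r"(?<=LVPR)(?=GS)"   # thrombin site LVPR|GS


def cleave(peptide: str, pattern: str) -> list[str]:
    """Split `peptide` at every bond matched by the zero-width `pattern`."""
    cuts = [m.start() for m in re.finditer(pattern, peptide)]
    bounds = [0, *cuts, len(peptide)]
    return [peptide[a:b] for a, b in zip(bounds, bounds[1:]) if peptide[a:b]]


if __name__ == "__main__":
    # "Gend" linker + DP + first TME1 residues (partial sequence only)
    chimera_end = "SDLSGGGGGLVPRGS" + "DP" + "IAGA"
    print(cleave(chimera_end, ASP_PRO))   # ['SDLSGGGGGLVPRGSD', 'PIAGA']
    print(cleave(chimera_end, THROMBIN))  # ['SDLSGGGGGLVPR', 'GSDPIAGA']
```

Note that acid cleavage at DP leaves the released membrane domain with an N-terminal proline.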

Expression of these chimeras at low temperature (20°C)
overnight showed that they are produced in native form.
Glutathione S-transferase (GST) activity can indeed be
detected in the membrane fraction of the bacteria. In
addition, this activity is measured in solution when
the membranes are solubilized in the presence of a
non-ionic detergent such as β-D-dodecylmaltoside, after
centrifugation.
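The text does not state how GST activity was measured. A common choice is the CDNB assay, in which GST conjugates glutathione to 1-chloro-2,4-dinitrobenzene and the product is followed at 340 nm with ε ≈ 9.6 mM⁻¹ cm⁻¹. Assuming that assay, activity follows from the Beer-Lambert law as sketched below; the absorbance slope in the usage line is hypothetical, and the function names are illustrative.

```python
EPSILON_340 = 9.6  # mM^-1 cm^-1, GSH-CDNB conjugate at 340 nm (assumed assay)


def gst_activity_umol_per_min(delta_a340_per_min: float, reaction_ml: float,
                              path_cm: float = 1.0,
                              epsilon: float = EPSILON_340) -> float:
    """Enzyme activity (µmol/min) from the rate of absorbance increase at 340 nm.

    Beer-Lambert: rate (mM/min) = (dA/min) / (epsilon * path). Since
    1 mM = 1 µmol/ml, multiplying by the reaction volume in ml gives µmol/min.
    """
    rate_mm_per_min = delta_a340_per_min / (epsilon * path_cm)
    return rate_mm_per_min * reaction_ml


def specific_activity(activity_umol_per_min: float, protein_mg: float) -> float:
    """Specific activity in µmol/min/mg of protein."""
    return activity_umol_per_min / protein_mg


if __name__ == "__main__":
    # Hypothetical reading: dA340/min = 0.096 in a 1 ml, 1 cm cuvette
    print(round(gst_activity_umol_per_min(0.096, 1.0), 4))  # 0.01
```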

Example 5: Expression of expression vectors encoding 
the fusion proteins including an Asp-Pro site and a 
site encoding thioredoxin (TrX) 

The pET32a-TrX, pET32a-TrX-dp-TME1 and
pET32a-TrX-dp-TME2 expression vectors were constructed
as described above and were then introduced into
BL21(DE3)pLysS bacteria. This strain was used for
comparison with the previous experiments. The positive
clones were cultured and induced as described above.

The induction of expression was carried out with IPTG,
as for that of the domains alone. The characteristics
of the proteins produced are summarized in Table 4
below.



Table 4*

| Plasmid | Chimera, abbreviation (Fig. 4) | Construct | Size, aa | Mass, Da |
| pET32a | Thioredoxin; TrX | 1M-C189 | 189 | 20397 |
| pET32a-Gend-dp-T1 | TrX-DP-TME1; TdpT1 | 1M-S115-PK-Gend-dp-T1 | 171 | 17796 |
| pET32a-Gend-dp-T2 | TrX-DP-TME2; TdpT2 | 1M-S115-PK-Gend-dp-T2 | 165 | 17481 |

*: T1 = TME1 and T2 = TME2



The amino acids (aa) are indicated with the one-letter
code. The numbering of the sequences refers to the
proteins of origin, GST and the viral polyprotein.
Residues of the membrane domains are indicated in
italics. "Gend" refers to the C-terminal sequence of
GST originating from the constructs with the pGEXKT
plasmid; it corresponds to the primary peptide sequence
SDLSGGGGGLVPRGS. The thioredoxin-SDLSGGGGGLVPRGS-DP-
(TME1 or TME2) chimeras are shorter than the protein
encoded by the parent vector, since the insertion is
made immediately after the thioredoxin sequence.

In Figure 5 : 

A: the bacterial growth was followed by measuring the 
increase in turbidity of each culture by optical density 
at 600 nm as a function of time. 

B: the bacteria were sampled as indicated for Figure 4.
They were then loaded onto a Laemmli 14% SDS-PAGE
acrylamide gel and treated as indicated for Figure 4.

As expected, and as shown by the growth curves
represented in the attached Figure 5A for the bacteria
transformed with the various plasmids, expression of the
TrX-DP-TME1 and TrX-DP-TME2 chimeras according to the
present invention is not toxic. The Laemmli 14% SDS-PAGE
[24] gel represented in the attached Figure 5B shows
that each chimera is overproduced.

The present invention therefore makes it possible to
produce, by genetic recombination, hydrophobic peptides
corresponding to the membrane domains of the E1 and E2
envelope proteins of the hepatitis C virus, whose
expression was known to be lethal with the techniques
of the prior art. In addition, since the effect is
observed on two genuinely different peptides that are
both initially toxic for the bacterium, the present
invention is expected to apply to other hydrophobic and
toxic peptides.

Example 6: Effect of the DP dipeptide on the toxicity of 
the TME1 and TME2 transmembrane domains expressed 
without fusion protein in the bacterium 

This example evaluates the antitoxic effect of the DP
dipeptide inserted in the absence of a GST or TrX
fusion protein, in accordance with the attached
Claim 1.

A) Materials: The pT7-7-ptTME1 and pT7-7-ptTME2 plasmids
are those described in Example 1. The pT7-7-dp-ptTME1
and pT7-7-dp-ptTME2 plasmids were constructed and
cloned in pT7-7 (SEQ ID NO: 33) as described in
Example 1, but using the Nde I (5') and EcoR I (3')
sites of the plasmid. The upstream (5')
oligonucleotides introduce the dp sequence (gacccg)
immediately after the initiating methionine codon
(atg). The templates used to generate each DNA were the
pT7-7-ptTME1 and pT7-7-ptTME2 plasmids. The sequences
were verified after cloning.

The oligonucleotides are as follows:

i) Cloning of the sequence encoding (M)DP-TME1 in pT7-7:

OL19 (+): 5'-CG CATATG GACCCGATCGCTGGTGCT-3' (Nde I
site: CATATG) = SEQ ID NO: 15 of the attached sequence
listing;

OL20 (-): 5'-GAATTCC TAAGCGTCAACACCAGC-3' (EcoR I
site: GAATTC) = SEQ ID NO: 16 of the attached sequence
listing.

ii) Cloning of the sequence encoding (M)DP-TME2 in pT7-7:

OL28 (+): 5'-CG CATATG GACCCGGAATACGTTGTTC-3' (Nde I
site: CATATG) = SEQ ID NO: 26 of the attached sequence
listing;

OL29 (-): 5'-CA GAATTCC TAAGCTTCAGCCTGAGAG-3' (EcoR I
site: GAATTC) = SEQ ID NO: 27 of the attached sequence
listing.
The pT7-7-dp-ptTME1 and pT7-7-dp-ptTME2 expression
vectors obtained are given in the attached sequence
listing (SEQ ID NO: 44 and SEQ ID NO: 45).
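The structure of the four primers above can be reproduced programmatically: a forward primer is a short 5' clamp, then CAT, so that the Nde I site CATATG supplies the ATG start codon, then the coding sequence; a reverse primer is an optional clamp, the EcoR I site, and the reverse complement of the 3' coding end plus the TAG stop codon. The coding fragments passed in below are read off the oligonucleotides themselves; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
NDE_I, ECOR_I = "CATATG", "GAATTC"


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def forward_primer(coding_5prime: str, clamp: str = "CG") -> str:
    """Clamp + Nde I site whose ATG is the start codon + coding sequence."""
    coding = coding_5prime.upper()
    if not coding.startswith("ATG"):
        raise ValueError("coding sequence must start with the ATG start codon")
    return clamp + NDE_I[:3] + coding


def reverse_primer(coding_3prime: str, stop: str = "TAG", clamp: str = "") -> str:
    """Clamp + EcoR I site + reverse complement of (coding end + stop codon)."""
    return clamp + ECOR_I + reverse_complement(coding_3prime + stop)


if __name__ == "__main__":
    print(forward_primer("ATGGACCCGATCGCTGGTGCT"))          # OL19
    print(reverse_primer("GCTGGTGTTGACGCT"))                 # OL20
    print(forward_primer("ATGGACCCGGAATACGTTGTTC"))          # OL28
    print(reverse_primer("CTCTCAGGCTGAAGCT", clamp="CA"))    # OL29
```

Each printed primer matches the corresponding oligonucleotide listed above once its spaces are removed.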

B) Legend of the attached Figures 7A and 7B: the
bacterial strain BL21(DE3)pLysS was transformed either
with the plasmid alone or with the versions of pT7-7
carrying the four constructs expressing TME1 or
M-DP-TME1 (Figure 7A), and TME2 or M-DP-TME2
(Figure 7B). M represents methionine; it is present at
the N-terminal position of the peptides when the toxic
proteins are produced according to the present
invention with the pT7-7 plasmid.

The growth of the various clones was compared after
induction with IPTG, following the chemical induction
protocol described in Example 2; the OD values were
averaged over 4 different clones for each construct.
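Averaging the growth curves of the four clones per construct amounts to a per-time-point mean. The sketch below also returns the sample standard deviation, which the text does not mention and is added here only as an illustration; the OD600 values in the usage lines are placeholders, not data from Figure 7, and `average_growth_curve` is a hypothetical name.

```python
from statistics import mean, stdev


def average_growth_curve(clones: list[list[float]]) -> list[tuple[float, float]]:
    """Mean and sample standard deviation of OD600 at each time point.

    `clones` holds one OD600 series per clone, all sampled at the same times.
    """
    if len(clones) < 2:
        raise ValueError("need at least two clones to estimate a standard deviation")
    if len({len(series) for series in clones}) != 1:
        raise ValueError("all clones must have the same number of time points")
    return [(mean(values), stdev(values)) for values in zip(*clones)]


if __name__ == "__main__":
    # Placeholder OD600 series for 4 clones at 3 time points (illustrative only)
    clones = [[1.0, 1.4, 1.9], [1.0, 1.5, 2.0], [1.0, 1.3, 1.8], [1.0, 1.4, 2.1]]
    for t, (m, s) in enumerate(average_growth_curve(clones)):
        print(f"t{t}: {m:.2f} ± {s:.2f}")
```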

C) Results: 

Figures 7A and 7B show that the bacteria carrying a
plasmid expressing the TME1 or TME2 proteins grow less
rapidly after induction than the control strain
transformed with the pT7-7 vector alone.

These results also show that the strains transformed
with the plasmids expressing the M-DP-TME1 (SEQ ID
NO: 50) and M-DP-TME2 (SEQ ID NO: 51) versions
according to the invention grow significantly better
than those expressing the transmembrane domains without
DP. This is true for TME1, and even more clearly so for
TME2.






The conclusion is that the N-terminal insertion of DP in
accordance with the present invention contributes,
surprisingly, to a significant decrease in toxicity of
the expression of the membrane domains, in particular in
the absence of a soluble fusion protein such as GST or
thioredoxin.